86 research outputs found

    Korean Genome Analysis

    Department of Biomedical Engineering. A personal genome can now be analyzed efficiently and at low cost using high-throughput whole-genome sequencing technologies together with fast and accurate computing methods such as machine learning. Very large-scale population genomics studies that extensively investigate whole ethnic groups are now feasible. Ethnic genome data are crucial resources for mapping population-specific genomic patterns and identifying disease- and phenotype-associated variants for use in healthcare. Genomics data can also provide insight into the population histories of both ancient and modern ethnic groups. Although there have been several Korean personal and population genomic studies in the past 20 years, very large-scale Korean population genomic data with matched phenotypes have not been made available. Further, the origin and composition of the Korean population have not been thoroughly studied using whole-genome and multiomics data. In this Ph.D. dissertation, I present my analysis of Korean genomes. In the first chapter, I present a variome set from the first phase of Genome Korea (Korea1K, 1094 Korean genomes), which is a subproject of the Korean Genome Project (KGP), and demonstrate its usefulness. The Korea1K variome analysis showed that the Korean population is genetically highly homogeneous compared to other East Asians. The Korea1K variome and its matched clinical traits data illustrated the significant advantage of using whole-genome sequences for genome-wide association studies, by identifying nine more significant candidate alleles than previously reported. As a reference variome panel for population genomics, the Korea1K panel showed better imputation accuracy for Koreans than the commonly used 1,000 genome project panel (1KGP) of the United Kingdom.
In the second chapter, I describe my investigation into the origin and genomic composition of the Korean population using 88 Korean whole genomes accompanied by 208 worldwide and 115 ancient genomes from various eras and geographic regions. This extensive comparative analysis suggested that the current genomic composition of Koreans may have been established through rapid admixture events between ancient southern Chinese associated with Bronze-Iron Age Southeast Asians and existing Northern Asians around and inside the Korean peninsula. I also speculate that the admixing trend initially occurred mainly outside the Korean peninsula, followed by continuous spread and localization within the Korean peninsula, which is consistent with the general admixture trend of East Asians in the Bronze and Iron ages that occurred about 4,500 years ago. The genomic composition of more than 70% of modern Koreans is thought to be derived from the recent population expansion and admixture events from the South. In the third chapter, I introduce the first systematically produced Korean Genome Project portal and its open API system, which makes the variant frequencies and association results of Korea1K efficiently accessible. In conclusion, I present a large-scale Korean genome analysis, thereby showing the usefulness of constructing the population variome set. The Korean variome analysis, in combination with worldwide modern and ancient genomic resources, can also be used to explain the origin of Koreans.

    Design of Automation Environment for Analyzing Various IoT Malware

    With the increasing proliferation of IoT systems, their security has become very important to individuals and businesses. IoT malware has been increasing exponentially since the emergence of Mirai in 2016. Because IoT system environments are diverse, IoT malware also targets a variety of environments. Existing analysis systems offer no environment for dynamic analysis that can run IoT malware built for various architectures, and building a separate environment for each malware sample is inefficient in terms of time and cost. The purpose of this paper is to address the problems and limitations of existing analysis systems and to provide an environment for analyzing a large amount of IoT malware. Using existing open source analysis tools suitable for various IoT malicious codes together with QEMU, a virtualization software, we build the environment in which the actual malicious code will run, and statically and dynamically analyze the libraries and system calls that are actually invoked. In the main text, the analysis system is applied to actual collected malicious code to check whether it can be analyzed and to derive statistics. Information on the architecture of the malicious code, attack method, commands used, and access path can be checked, and this information can be used as a basis for malicious code detection or classification research. We describe the advantages of the designed system over the most commonly used automated analysis tools and its improvements on existing limitations.
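
One step such an environment needs is deciding which emulator can run a given sample. The abstract does not say how this is done; the following is only a minimal sketch, assuming samples are ELF binaries and that QEMU user-mode emulators are used. The function `pick_qemu` and the mapping table are hypothetical illustrations, and the table covers only a few architectures.

```python
import struct

# ELF e_machine codes mapped to QEMU user-mode emulator names as
# (little-endian name, big-endian name). Illustrative subset only.
QEMU_BY_MACHINE = {
    3: ("qemu-i386", None),                    # EM_386
    8: ("qemu-mipsel", "qemu-mips"),           # EM_MIPS
    40: ("qemu-arm", "qemu-armeb"),            # EM_ARM
    62: ("qemu-x86_64", None),                 # EM_X86_64
    183: ("qemu-aarch64", "qemu-aarch64_be"),  # EM_AARCH64
}


def pick_qemu(header: bytes):
    """Return a QEMU user-mode emulator name for an ELF header, or None."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None  # not an ELF file, or too short to contain e_machine
    data_encoding = header[5]  # EI_DATA: 1 = little-endian, 2 = big-endian
    if data_encoding == 1:
        fmt, index = "<H", 0
    elif data_encoding == 2:
        fmt, index = ">H", 1
    else:
        return None
    # e_machine is a 16-bit field at byte offset 18, after e_ident and e_type.
    (machine,) = struct.unpack_from(fmt, header, 18)
    names = QEMU_BY_MACHINE.get(machine)
    return names[index] if names else None


def make_header(data_encoding: int, machine: int) -> bytes:
    """Build a minimal fake ELF header for demonstration purposes."""
    e_ident = b"\x7fELF" + bytes([1, data_encoding, 1]) + bytes(9)
    fmt = "<HH" if data_encoding == 1 else ">HH"
    return e_ident + struct.pack(fmt, 2, machine)  # e_type = 2 (ET_EXEC)


if __name__ == "__main__":
    print(pick_qemu(make_header(1, 40)))  # little-endian ARM
    print(pick_qemu(make_header(2, 8)))   # big-endian MIPS
```

In practice the first 20 bytes of each collected sample would be read from disk and passed to `pick_qemu`; samples for which it returns `None` would need manual handling.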

    Gypsum-Dependent Effect of NaCl on Strength Enhancement of CaO-Activated Slag Binders

    This study explores the combined effect of NaCl and gypsum on the strength of the CaO-activated ground-granulated blast furnace slag (GGBFS) binder system. In the CaO-activated GGBFS system, the incorporation of NaCl without gypsum did not improve the strength of the system. However, in the presence of gypsum, the use of NaCl yielded significantly greater strength than the use of either gypsum or NaCl alone. The presence of NaCl greatly increases the solubility of gypsum in solution, leading to a higher concentration of sulfate ions, which is essential for more and faster formation of ettringite in a fresh paste mixture. The significant strength enhancement with gypsum was likely due to the accelerated and increased formation of ettringite, accompanied by more efficient filling of pores in the system.

    Depression and suicide risk prediction models using blood-derived multi-omics data

    More than 300 million people worldwide experience depression; annually, ~800,000 people die by suicide. Unfortunately, conventional interview-based diagnosis is insufficient to accurately predict psychiatric status. We developed machine learning models to predict depression and suicide risk using blood methylome and transcriptome data from 56 suicide attempters (SAs), 39 patients with major depressive disorder (MDD), and 87 healthy controls. Our random forest classifiers showed accuracies of 92.6% in distinguishing SAs from MDD patients, 87.3% in distinguishing MDD patients from controls, and 86.7% in distinguishing SAs from controls. We also developed regression models for predicting psychiatric scales, with R² values of 0.961 and 0.943 for the Hamilton Rating Scale for Depression-17 and the Scale for Suicide Ideation, respectively. Multi-omics data were used to construct psychiatric status prediction models for improved mental health treatment.
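
For readers unfamiliar with the R² figures quoted above, the coefficient of determination can be sketched as follows. This is the standard textbook definition, not necessarily the exact procedure the authors used; the function name and the toy numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    if ss_total == 0:
        raise ValueError("observed values have zero variance")
    return 1 - ss_residual / ss_total


if __name__ == "__main__":
    # Hypothetical scale scores and model predictions, for illustration only.
    scores = [4.0, 10.0, 17.0, 23.0]
    predictions = [5.0, 9.0, 18.0, 22.0]
    print(round(r_squared(scores, predictions), 3))
```

A value of 1 means the predictions match the observed scores exactly, while 0 means the model does no better than always predicting the mean score.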

    Whole Genome Analysis of the Red-Crowned Crane Provides Insight into Avian Longevity

    The red-crowned crane (Grus japonensis) is an endangered, large-bodied crane native to East Asia. It is a traditional symbol of longevity and its long lifespan has been confirmed both in captivity and in the wild. Lifespan in birds is known to be positively correlated with body size and negatively correlated with metabolic rate, though the genetic mechanisms for the red-crowned crane's long lifespan have not previously been investigated. Using whole genome sequencing and comparative evolutionary analyses against the grey-crowned crane and other avian genomes, including the long-lived common ostrich, we identified red-crowned crane candidate genes with known associations with longevity. Among these are positively selected genes in metabolism and immunity pathways (NDUFA5, NDUFA8, NUDT12, SOD3, CTH, RPA1, PHAX, HNMT, HS2ST1, PPCDC, PSTK, CD8B, GP9, IL-9R, and PTPRC). Our analyses provide genetic evidence for low metabolic rate and longevity, accompanied by possible convergent adaptation signatures among distantly related large and long-lived birds. Finally, we identified low genetic diversity in the red-crowned crane, consistent with its listing as an endangered species, and this genome should provide a useful genetic resource for future conservation studies of this rare and iconic species.

    KOREF_S1: phased, parental trio-binned Korean reference genome using long reads and Hi-C sequencing methods

    Background: KOREF is the Korean reference genome, which was constructed with various sequencing technologies including long reads, short reads, and optical mapping methods. It is also the first East Asian multiomic reference genome accompanied by extensive clinical information, time-series and multiomic data, and parental sequencing data. However, it was still not a chromosome-scale reference. Here, we updated the previous KOREF assembly to a new chromosome-level haploid assembly of KOREF, KOREF_S1v2.1. Oxford Nanopore Technologies (ONT) PromethION, Pacific Biosciences HiFi-CCS, and Hi-C technology were used to build the most accurate East Asian reference assembled so far. Results: We produced 705 Gb of ONT reads and 114 Gb of Pacific Biosciences HiFi reads, and corrected the ONT reads using the Pacific Biosciences reads. The corrected ultra-long reads reached a higher accuracy (1.4% base errors) than the previous KOREF_S1v1.0, which was mainly built with short reads. KOREF has parental genome information, and we successfully phased it using a trio-binning method, acquiring a near-complete haploid assembly. The final assembly resulted in a total length of 2.9 Gb with an N50 of 150 Mb, and the longest scaffold covered 97.3% of GRCh38's chromosome 2. In addition, the final assembly showed high base accuracy, with [...] Conclusions: KOREF_S1v2.1 is the first chromosome-scale haploid assembly of the Korean reference genome with high contiguity and accuracy. Our study provides useful resources for the Korean reference genome and demonstrates a new strategy of hybrid assembly that combines ONT's PromethION and PacBio's HiFi-CCS.

    Regional TMPRSS2 V197M Allele Frequencies Are Correlated with COVID-19 Case Fatality Rates.

    Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has a higher case fatality rate in European countries than in others, especially East Asian ones. One potential explanation for this regional difference is variation in viral infection efficiency. Here, we analyzed the allele frequencies of a nonsynonymous variant, rs12329760 (V197M), in the TMPRSS2 gene, which encodes a key enzyme essential for viral infection, and found a significant association between the COVID-19 case fatality rate and the V197M allele frequencies, using over 200,000 present-day and ancient genomic samples. East Asian countries have higher V197M allele frequencies than other regions, including European countries, which correlates with their lower case fatality rates. Structural and energy calculation analysis of the V197M amino acid change showed that it destabilizes the TMPRSS2 protein, possibly negatively affecting its ACE2 and viral spike protein processing.
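
The abstract does not state which statistic was used to measure the association. As a rough illustration of what a correlation between regional allele frequency and case fatality rate means, here is a minimal Pearson correlation sketch. The function name and every number in the example are made-up placeholders, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("each sequence must vary")
    return covariance / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Placeholder values only: one allele frequency and one case fatality
    # rate per hypothetical region. These are NOT figures from the study.
    allele_frequency = [0.10, 0.20, 0.30, 0.40]
    case_fatality_rate = [0.08, 0.06, 0.05, 0.02]
    print(round(pearson_r(allele_frequency, case_fatality_rate), 3))
```

A coefficient near -1 would correspond to the pattern the abstract describes, where regions with higher V197M frequencies have lower case fatality rates. A correlation alone does not establish that the variant causes the difference.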

    Chromosome-scale assembly comparison of the Korean Reference Genome KOREF from PromethION and PacBio with Hi-C mapping information.

    BACKGROUND: Long DNA reads produced by single-molecule and pore-based sequencers are more suitable for assembly and structural variation discovery than short-read DNA fragments. For de novo assembly, Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are the favorite options. However, PacBio's SMRT sequencing is expensive for a full human genome assembly and costs more than $40,000 US for 30× coverage as of 2019. ONT PromethION sequencing, on the other hand, is 1/12 the price of PacBio for the same coverage. This study aimed to compare the cost-effectiveness of ONT PromethION and PacBio's SMRT sequencing in relation to the quality. FINDINGS: We performed whole-genome de novo assemblies and comparison to construct an improved version of KOREF, the Korean reference genome, using sequencing data produced by PromethION and PacBio. With PromethION, an assembly using sequenced reads with 64× coverage (193 Gb, 3 flowcell sequencing) resulted in 3,725 contigs with N50s of 16.7 Mb and a total genome length of 2.8 Gb. It was comparable to a KOREF assembly constructed using PacBio at 62× coverage (188 Gb, 2,695 contigs, and N50s of 17.9 Mb). When we applied Hi-C-derived long-range mapping data, an even higher quality assembly for the 64× coverage was achieved, resulting in 3,179 scaffolds with an N50 of 56.4 Mb. CONCLUSION: The pore-based PromethION approach provided a high-quality chromosome-scale human genome assembly at a low cost with long maximum contig and scaffold lengths and was more cost-effective than PacBio at comparable quality measurements.
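
The N50 values quoted above summarize assembly contiguity. N50 is the length of the contig or scaffold at which the cumulative length, counted from the longest sequence downward, first reaches half of the total assembly length. A minimal sketch of that common definition follows; the function name and the toy lengths are hypothetical, and assembly tools may differ in details such as which sequences are included.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downward until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    # Toy contig lengths in arbitrary units, for illustration only.
    toy_contigs = [80, 70, 50, 40, 30, 20, 10]
    print(n50(toy_contigs))
```

Joining contigs into longer scaffolds, as the Hi-C step does, moves more of the total length into fewer sequences, which is why the scaffold N50 reported above is higher than the contig N50.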

    Decoding a highly mixed Kazakh genome.

    We provide a Kazakh whole genome sequence (MJS) and analyses with the largest comparative Kazakh genomic data available to date. We found 102,240 novel SNVs and a high level of heterozygosity. ADMIXTURE analysis confirmed a significant proportion of variations in this individual coming from all continents except Africa and Oceania. A principal component analysis showed neighboring Kalmyk, Uzbek, and Kyrgyz populations to have the strongest resemblance to the MJS genome, which reflects fairly recent Kazakh history. MJS's mitochondrial haplogroup, J1c2, probably represents an early European and Near Eastern influence on Central Asia. This was also supported by the heterozygous SNPs associated with European phenotypic features and the strikingly similar Kazakh ancestral composition inferred by ADMIXTURE. Admixture (f3) analysis showed that MJS's genomic signature is best described as a cross between the Neolithic East Asian (Devil's Gate1) and the Bronze Age European (Halberstadt_LBA1) components rather than a contemporary admixture.